2 A compact row storage scheme for Cholesky factors using elimination trees
Joseph W. Liu 

June 1986 ACM Transactions on Mathematical Software (TOMS), volume 12 issue 2 

Full text available: pdf(1.47 MB)  Additional Information: full citation, abstract, references, citings, index terms, review


For a given sparse symmetric positive definite matrix, a compact row-oriented storage scheme 
for its Cholesky factor is introduced. The scheme is based on the structure of an elimination tree 
defined for the given matrix. This new storage scheme has the distinct advantage of having the 
amount of overhead storage required for indexing always bounded by the number of nonzeros in 
the original matrix. The structural representation may be viewed as storing the minimal 
structure of the given matr ... 

3 Optimization of parser tables for portable compilers 
Peter Dencker, Karl Durre, Johannes Heuft 

October 1984 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 
6 Issue 4 

Full text available: pdf(1.53 MB)  Additional Information: full citation, references, citings, index terms, review


4 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), Volume 24 issue 4 

Full text available: pdf(6.23 MB)  Additional Information: full citation, abstract, references, citings, index terms, review

Research aimed at correcting words in text has focused on three progressively more difficult 
problems: (1) nonword error detection; (2) isolated-word error correction; and (3)
context-dependent word correction. In response to the first problem, efficient pattern-matching
and n-gram analysis techniques have been developed for detecting strings that do not appear in a
given word list. In response to the second problem, a variety of general and application-specific 
spelling cor ... 

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent spelling
5 Optimizing memory usage in the polyhedral model
Fabien Quillere, Sanjay Rajopadhye 

September 2000 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 22 Issue 5

Full text available: pdf(411.82 KB)  Additional Information: full citation, abstract, references, citings, index terms, review

The polyhedral model provides a single unified foundation for systolic array synthesis and 
automatic parallelization of loop programs. We investigate the problem of memory reuse when 
compiling Alpha (a functional language based on this model). Direct compilation would require 
unacceptably large memory (for example O(n^3) for matrix multiplication). Researchers have
previously addressed the problem of memory reuse, and the analysis that t ... 

Keywords: affine recurrence equations, applicative (functional) languages, automatic 
parallelization, data-parallel languages, dataflow analysis, dependence analysis, lifetime 
analysis, memory management, parallel code generation, polyhedral model, scheduling 


6 Fast detection of communication patterns in distributed executions 
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies on 
Collaborative research 

Full text available: pdf(4.21 MB)  Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of the 
application. The visualization tool we use is Poet, an event tracer developed at the University of 
Waterloo. However, these diagrams are often very complex and do not provide the user with the 
desired overview of the application. In our experience, such tools display repeated occurrences 
of non-trivial commun ... 


7 Next-generation generic programming and its application to sparse matrix computations
Nikolay Mateev, Keshav Pingali, Paul Stodghill, Vladimir Kotlyar 

May 2000 Proceedings of the 14th international conference on Supercomputing 

Full text available: pdf(1.06 MB)  Additional Information: full citation, abstract, references, citings, index terms

The contributions of this paper are the following. We introduce a new variety of generic 
programming in which algorithm implementors use a different API than data structure designers, 
the gap between the API's being bridged by restructuring compilers. One view of this approach 
is that it exploits restructuring compiler technology to perform a novel kind of template 
instantiation. We demonstrate the usefulness of this new generic programming technology ... 


8 Clustering: Document clustering based on non-negative matrix factorization 
Wei Xu, Xin Liu, Yihong Gong 

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on 
Research and development in information retrieval

Full text available: pdf(216.50 KB)  Additional Information: full citation, abstract, references, index terms

In this paper, we propose a novel document clustering method based on the non-negative 
factorization of the term-document matrix of the given document corpus. In the latent semantic 
space derived by the non-negative matrix factorization (NMF), each axis captures the base topic 
of a particular document cluster, and each document is represented as an additive combination 
of the base topics. The cluster membership of each document can be easily determined by 
finding the base topic (the axis) with w ... 

Keywords: document clustering, non-negative matrix factorization 
9 Inverted files versus signature files for text indexing
Justin Zobel, Alistair Moffat, Kotagiri Ramamohanarao 

December 1998 ACM Transactions on Database Systems (TODS), volume 23 issue 4

Full text available: pdf(243.62 KB)  Additional Information: full citation, abstract, references, citings, index terms

Two well-known indexing methods are inverted files and signature files. We have undertaken a 
detailed comparison of these two approaches in the context of text indexing, paying particular 
attention to query evaluation speed and space requirements. We have examined their relative 
performance using both experimentation and a refined approach to modeling of signature files, 
and demonstrate that inverted files are distinctly superior to signature files. Not only can 
inverted files be used to ev ... 

Keywords: indexing, inverted files, performance, signature files, text databases, text indexing 


10 The design and implementation of a new out-of-core sparse Cholesky factorization method
Vladimir Rotkin, Sivan Toledo 

March 2004 ACM Transactions on Mathematical Software (TOMS), Volume 30 issue 1
Full text available: pdf(457.74 KB)  Additional Information: full citation, abstract, references, index terms

We describe a new out-of-core sparse Cholesky factorization method. The new method uses the 
elimination tree to partition the matrix, an advanced subtree-scheduling algorithm, and both 
right-looking and left-looking updates. The implementation of the new method is efficient and 
robust. On a 2 GHz personal computer with 768 MB of main memory, the code can easily factor 
matrices with factors of up to 48 GB, usually at rates above 1 Gflop/s. For example, the code 
can factor audikw, currently the lar ...

Keywords: out-of-core 


11 A framework for sparse matrix code synthesis from high-level specifications 
Nawaaz Ahmed, Nikolay Mateev, Keshav Pingali 

November 2000 Proceedings of the 2000 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(140.18 KB), Publisher Site  Additional Information: full citation, abstract, references, citings, index terms

We present compiler technology for synthesizing sparse matrix code from (i) dense matrix code, 
and (ii) a description of the index structure of a sparse matrix. Our approach is to embed 
statement instances into a Cartesian product of statement iteration and data spaces, and to 
produce efficient sparse code by identifying common enumerations for multiple references to 
sparse matrices. The approach works for imperfectly-nested codes with dependences, and 
produces sparse code competitive with ... 


12 Light field mapping: efficient representation and hardware rendering of surface light fields
Wei-Chao Chen, Jean-Yves Bouguet, Michael H. Chu, Radek Grzeszczuk 
July 2002 ACM Transactions on Graphics (TOG), Proceedings of the 29th annual
conference on Computer graphics and interactive techniques, volume 21 issue 3
Full text available: pdf(7.79 MB)  Additional Information: full citation, abstract, references, citings, index terms

A light field parameterized on the surface offers a natural and intuitive description of the view- 
dependent appearance of scenes with complex reflectance properties. To enable the use of 
surface light fields in real-time rendering we develop a compact representation suitable for an 
accelerated graphics pipeline. We propose to approximate the light field data by partitioning it 
over elementary surface primitives and factorizing each part into a small set of lower- 
dimensional functions. We show th ... 


Keywords: compression algorithms, image-based rendering, rendering hardware, texture 
13 Multimedia data indexing: A PCA-based similarity measure for multivariate time series
Kiyoung Yang, Cyrus Shahabi 

November 2004 Proceedings of the 2nd ACM international workshop on Multimedia
databases 

Full text available: pdf(207.48 KB)  Additional Information: full citation, abstract, references, index terms

Multivariate time series (MTS) datasets are common in various multimedia, medical and financial 
applications. We propose a similarity measure for MTS datasets, Eros (Extended
Frobenius norm), which is based on Principal Component Analysis (PCA).
Eros applies PCA to MTS datasets represented as matrices to generate principal
components and associated eigenvalues. These principal components and eigenvalues are then 
used to ... 

Keywords: multivariate time series, nearest neighbor search, principal component analysis, 
similarity measure, singular value decomposition 


14 Performance of distributed sparse Cholesky factorization with pre-scheduling
S. Venugopal, V. K. Naik, J. Saltz 

December 1992 Proceedings of the 1992 ACM/IEEE conference on Supercomputing 

Full text available: pdf(978.77 KB)  Additional Information: full citation, references, citings, index terms


15 Compiling parallel code for sparse matrix applications 
Vladimir Kotlyar, Keshav Pingali, Paul Stodghill 

November 1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(161.83 KB)  Additional Information: full citation, abstract, references, citings

We have developed a framework based on relational algebra for compiling efficient sparse 
matrix code from dense DO-ANY loops and a specification of the representation of the sparse 
matrix. In this paper, we show how this framework can be used to generate parallel code, and 
present experimental data that demonstrates that the code generated by our Bernoulli compiler 
achieves performance competitive with that of hand-written codes for important computational 
kernels. 

Keywords: parallelizing compilers, sparse matrix computations 


16 Can an APL workspace be used as a data base? 
Karl Soop 

June 1984 ACM SIGAPL APL Quote Quad , Proceedings of the international conference on 
APL, Volume 14 Issue 4 

Full text available: pdf(919.35 KB)  Additional Information: full citation, abstract, references, citings, index terms

Experience from applications that use APL workspaces as data storage is reported. Different 
design decisions are discussed with illustrations of how APL is exploited. The final design, which 
achieves an utter simplicity of data representation, is described, with examples of usage. This 
simplicity allows a developer to concentrate on data manipulation, where the power of APL is at 
its best, rather than on storage techniques. 

17 A generalized envelope method for sparse factorization by rows 
Joseph W. H. Liu 

March 1991 ACM Transactions on Mathematical Software (TOMS), volume 17 issue 1

Full text available: pdf(1.09 MB)  Additional Information: full citation, abstract, references, citings, index terms, review

A generalized form of the envelope method is proposed for the solution of large sparse
symmetric and positive definite matrices by rows. The method is demonstrated to have practical
advantages over the conventional column-oriented factorization using compressed column 
storage or the multifrontal method using full frontal submatrices. 

Keywords: elimination tree, envelope method, factorization by rows, sparse matrices 


18 Run-time compilation for parallel sparse matrix computations 
Cong Fu, Tao Yang 

January 1996 Proceedings of the 10th international conference on Supercomputing 

Full text available: pdf(981.00 KB)  Additional Information: full citation, references, citings, index terms


19 A sub-quadratic sequence alignment algorithm for unrestricted cost matrices 
Maxime Crochemore, Gad M. Landau, Michal Ziv-Ukelson 

January 2002 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(1.04 MB)  Additional Information: full citation, abstract, references, citings

The classical algorithm for computing the similarity between two sequences [36, 39] uses a 
dynamic programming matrix, and compares two strings of size n in O(n^2) time. We address the
challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use 
a scoring matrix of unrestricted weights. Our algorithm applies to both local and global 
alignment computations. The speed-up is achieved by dividing the dynamic programming ...

20 PSBLAS: a library for parallel linear algebra computation on sparse matrices 
Salvatore Filippone, Michele Colajanni 

December 2000 ACM Transactions on Mathematical Software (TOMS), Volume 26 issue 4 

Full text available: pdf(139.60 KB)  Additional Information: full citation, abstract, references, index terms, review

Many computationally intensive problems in engineering and science give rise to the solution of 
large, sparse, linear systems of equations. Fast and efficient methods for their solution are very
important because these systems usually occur in the innermost loop of the computational 
scheme. Parallelization is often necessary to achieve an acceptable level of performance. This 
paper presents the design, implementation, and interface of a library of Basic Linear Algebra 
Subroutines for sparse ... 

Keywords: basic linear algebra subprograms 
21 Sparse LU factorization with partial pivoting on distributed memory machines
Cong Fu, Tao Yang 

November 1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(371.71 KB)  Additional Information: full citation, abstract, references, citings, index terms

Sparse LU factorization with partial pivoting is important to many scientific applications, but the 
effective parallelization of this algorithm is still an open problem. The main difficulty is that 
partial pivoting operations make structures of L and U factors unpredictable beforehand. This 
paper presents a novel approach called S* for parallelizing this problem on distributed memory 
machines. S* incorporates static symbolic factorization to avoid run-time control overhead and 
uses nonsymme ... 

22 GlOSS: text-source discovery over the Internet
Luis Gravano, Hector Garcia-Molina, Anthony Tomasic

June 1999 ACM Transactions on Database Systems (TODS), Volume 24 issue 2 

Full text available: pdf(230.37 KB)  Additional Information: full citation, abstract, references, citings, index terms, review


The dramatic growth of the Internet has created a new problem for users: location of the 
relevant sources of documents. This article presents a framework for (and experimentally 
analyzes a solution to) this problem, which we call the text-source discovery problem. Our 
approach consists of two phases. First, each text source exports its contents to a centralized 
service. Second, users present queries to the service, which returns an ordered list of promising 
text sources. T ... 


Keywords: Internet search and retrieval, digital libraries, distributed information retrieval, text 
databases 


23 Compressed multi-framed signature files: an index structure for fast information retrieval
Seyit Koçberber, Fazli Can

February 1999 Proceedings of the 1999 ACM symposium on Applied computing 

Full text available: pdf(680.36 KB) Additional Information: full citation , references , index terms 


Keywords: compression, inverted files, signature files 
Ella Bingham, Heikki Mannila

August 2001 Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining 

Full text available: pdf(592.18 KB)  Additional Information: full citation, abstract, references, citings, index terms

Random projections have recently emerged as a powerful method for dimensionality reduction. 
Theoretical results indicate that the method preserves distances quite nicely; however, empirical 
results are sparse. We present experimental results on using random projection as a 
dimensionality reduction tool in a number of cases, where the high dimensionality of the data 
would otherwise lead to burdensome computations. Our application areas are the processing of
both noisy and noiseless images, and i ... 

Keywords: dimensionality reduction, high-dimensional data, image data, random projection, 
text document data 


25 The Multifrontal Solution of Indefinite Sparse Symmetric Linear
I. S. Duff, J. K. Reid 

September 1983 ACM Transactions on Mathematical Software (TOMS), volume 9 issue 3 
Full text available: pdf(1.61 MB)  Additional Information: full citation, references, citings, index terms


26 A New Implementation of Sparse Gaussian Elimination 
Robert Schreiber 

September 1982 ACM Transactions on Mathematical Software (TOMS), volume 8 issue 3 
Full text available: pdf(1.05 MB)  Additional Information: full citation, references, citings, index terms


27 Poster papers: Topics in 0-1 data
Ella Bingham, Heikki Mannila, Jouni K. Seppanen 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge 
discovery and data mining 

Full text available: pdf(617.62 KB)  Additional Information: full citation, abstract, references, index terms

Large 0-1 datasets arise in various applications, such as market basket analysis and
information retrieval. We concentrate on the study of topic models, aiming at results which 
indicate why certain methods succeed or fail. We describe simple algorithms for finding topic 
models from 0-1 data. We give theoretical results showing that the algorithms can discover the
epsilon-separable topic models of Papadimitriou et al. We present empirical results showing that 
the algorithms find natural topics ... 

28 Special issue on spatial database systems: Management of multidimensional discrete data
Peter Baumann 

October 1994 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 3 Issue 4 

Full text available: pdf(2.30 MB)  Additional Information: full citation, abstract, references, citings

Spatial database management involves two main categories of data: vector and raster data. The 
former has received a lot of in-depth investigation; the latter still lacks a sound framework. 
Current DBMSs either regard raster data as pure byte sequences where the DBMS has no 
knowledge about the underlying semantics, or they do not complement array structures with 
storage mechanisms suitable for huge arrays, or they are designed as specialized systems with 
sophisticated imaging functionality, but n ... 

Keywords: Multimedia database systems, image database systems, spatial index, tiling 
Ove Edlund

December 2002 ACM Transactions on Mathematical Software (TOMS), Volume 28 issue 4 
Full text available: pdf(490.01 KB) Additional Information: full citation , abstract , references , index terms 

Although there is good software for sparse QR factorization, there is little support for updating 
and downdating, something that is absolutely essential in some linear programming algorithms, 
for example. This article describes an implementation of sparse LQ factorization, including block 
triangularization, approximate minimum degree ordering, symbolic factorization, multifrontal 
factorization, and updating and downdating. The factor Q is not retained. The updating algorithm 
expands the n ... 

Keywords: Sparse matrix, downdating, orthogonal factorization, software, updating 
Monica S. Lam, Martin C. Rinard

April 1991 ACM SIGPLAN Notices , Proceedings of the third ACM SIGPLAN symposium on 
31 The design of a high performance information filtering system
Timothy A. H. Bell, Alistair Moffat 

August 1996 Proceedings of the 19th annual international ACM SIGIR conference on 
Research and development in information retrieval 
32 MPEG-4 Video transmission over wireless networks: a link level performance study
Ji-An Zhao, Bo Li, Chi-Wah Kok, Ishfaq Ahmad 
March 2004 Wireless Networks, Volume 10 issue 2 

Full text available: pdf(306.85 KB)  Additional Information: full citation, abstract, references, index terms

With the scalability and flexibility of the MPEG-4 and the emergence of the broadband wireless 
network, wireless multimedia services are foreseen to become deployed in the near future. 
Transporting MPEG-4 video over the broadband wireless network is expected to be an important 
component of many emerging multimedia applications. One of the critical issues for multimedia 
applications is to ensure that the quality-of-service (QoS) requirement is maintained at an
acceptable level. This is further ... 

Keywords: DBMAP with marked transitions, DBMAP/PH/1 priority queue, HMM channel, PH-type 
distribution 


33 Homomorphic factorization of BRDFs for high-performance rendering 
Michael D. McCool, Jason Ang, Anis Ahmad 

August 2001 Proceedings of the 28th annual conference on Computer graphics and 
interactive techniques 

Full text available: pdf(2.33 MB)  Additional Information: full citation, abstract, references, citings, index terms

A bidirectional reflectance distribution function (BRDF) describes how a material reflects light 
from its surface. To use arbitrary BRDFs in real-time rendering, a compression technique must 
be used to represent BRDFs using the available texture-mapping and computational capabilities 
of an accelerated graphics pipeline. We present a numerical technique, homomorphic 
factorization, that can decompose arbitrary BRDFs into products of two or more factors of lower 
dimensionality, each factor de ... 
34 Implementation and computational results for the hierarchical algorithm for making sparse
matrices sparser

S. Frank Chang, S. Thomas McCormick 
Full text available: pdf(1.52 MB)  Additional Information: full citation, abstract, references, index terms, review

If A is the (sparse) coefficient matrix of linear-equality constraints, for what nonsingular T is Â =
TA as sparse as possible, and how can it be efficiently computed? An efficient algorithm for this
Sparsity Problem (SP) would be a valuable preprocessor for linearly constrained optimization 
problems. In a companion paper we developed a two-pass approach to solve SP called the 
Hierarchical Algorithm. In this paper we report on how we implem ... 

35 A column pre-ordering strategy for the unsymmetric-pattern multifrontal method
Timothy A. Davis 

June 2004 ACM Transactions on Mathematical Software (TOMS), volume 30 issue 2 

Full text available: pdf(401.79 KB)  Additional Information: full citation, abstract, references, citings, index terms

A new method for sparse LU factorization is presented that combines a column pre-ordering 
strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The 
column ordering is selected to give a good a priori upper bound on fill-in and then refined during 
numerical factorization (while preserving the bound). Pivot rows are selected to maintain 
numerical stability and to preserve sparsity. The method analyzes the matrix and automatically 
selects one of three pre-ordering ... 

Keywords: linear equations, multifrontal method, ordering methods, sparse nonsymmetric 
matrices 
Yi-Ming Chung, William M. Pottenger, Bruce R. Schatz
37 Multiresolution Green's function methods for interactive simulation of large-scale
elastostatic objects 

Doug L. James, Dinesh K. Pai 

January 2003 ACM Transactions on Graphics (TOG), volume 22 issue 1 

Full text available: pdf(8.69 MB)  Additional Information: full citation, abstract, references, citings, index terms

We present a framework for low-latency interactive simulation of linear elastostatic models, and 
other systems arising from linear elliptic partial differential equations, which makes it feasible to 
interactively simulate large-scale physical models. The deformation of the models is described 
using precomputed Green's functions (GFs), and runtime boundary value problems (BVPs) are 
solved using existing Capacitance Matrix Algorithms (CMAs). Multiresolution techniques are 
introduced to control the ... 

Keywords: Capacitance matrix, Green's function, deformation, elastostatic, fast summation, 
force feedback, interactive real-time applications, lifting scheme, real-time, updating, wavelets 

38 Automatic parsing for content analysis 
Frederick J. Damerau 

June 1970 Communications of the ACM, Volume 13 issue 6 

Full text available: pdf(4.07 MB)  Additional Information: full citation, abstract, references, citings
Although automatic syntactic and semantic analysis is not yet possible for all of an unrestricted 
natural language text, some applications, of which content analysis is one, do not have such a
stringent coverage requirement. Preliminary studies show that the Harvard Syntactic Analyzer 
can produce correct and unambiguous identification of the subject and object of certain verbs for 
approximately half of the relevant occurrences. This provides a degree of coverage for content
analysis variable ... 

Keywords: content analysis, information retrieval, language analysis, natural language
processing, parsing, syntactic analysis, text processing 


39 Machine learning in automated text categorization
Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), Volume 34 issue 1

Full text available: pdf(524.41 KB)  Additional Information: full citation, abstract, references, citings, index terms

The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of documents 
in digital form and the ensuing need to organize them. In the research community the dominant 
approach to this problem is based on machine learning techniques: a general inductive process 
automatically builds a classifier by learning, from a set of preclassified documents, the 
characteristics of the categories. ... 

Keywords: Machine learning, text categorization, text classification 
conference on Management of data, volume 26 issue 2
Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the
entire dataset on disk. While compression can be used to decrease the size of the dataset, 
compressed data is notoriously difficult to index or access. In this paper we consider a very large 
dataset comprising multiple distinct time sequences. Each point in the sequence is a numerical 
value. We show how to compress such a dataset into a format that supports ad hoc querying, 
provided ... 

Results 21 - 40 of 200

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc. 


